CNGL-CORE: Referential Translation Machines for Measuring Semantic Similarity

Authors

  • Ergun Biçici
  • Josef van Genabith
Abstract

We invent referential translation machines (RTMs), a computational model for identifying the translation acts between any two data sets with respect to a reference corpus selected in the same domain, which can be used for judging the semantic similarity between texts. RTMs make quality and semantic similarity judgments possible by using retrieved relevant training data as interpretants for reaching shared semantics. A machine translation performance predictor (MTPP) model derives features measuring the closeness of the test sentences to the training data, the difficulty of translating them, and the presence of acts of translation involved. We view semantic similarity as paraphrasing between any two given texts. Each view is modeled by an RTM model, giving us a new perspective on the binary relationship between the two. Our prediction model ranks 15th on some tasks and 30th overall out of 89 submissions according to the official results of the Semantic Textual Similarity (STS 2013) challenge.

1 Semantic Textual Similarity Judgments

We introduce a fully automated judge for semantic similarity that performs well in the semantic textual similarity (STS) task (Agirre et al., 2013). STS measures the degree of semantic equivalence between two texts, based on the observation that “vehicle” and “car” are more similar than “wave” and “car”. Accurate prediction of STS has a wide range of applications, including identifying whether two tweets are talking about the same thing, whether an answer is correct by comparing it with a reference answer, and whether a given shorter text is a valid summary of another text.

The translation quality estimation task (Callison-Burch et al., 2012) aims to develop quality indicators and predictors for translations at the sentence level without access to a reference translation. Biçici et al. (2013) develop a top-performing machine translation performance predictor (MTPP), which uses machine learning models over features measuring how well the test set matches the training set, relying on extrinsic and language-independent features.

The semantic textual similarity (STS) task (Agirre et al., 2013) addresses the following problem. Given two sentences S1 and S2 in the same language, quantify the degree of similarity with a similarity score, which is a number in the range [0, 5]. The semantic textual similarity prediction problem involves finding a function f approximating the semantic textual similarity score q given two sentences, S1 and S2:

    f(S1, S2) ≈ q(S1, S2)    (1)

We approach f as a supervised learning problem with (S1, S2, q(S1, S2)) tuples being the training data and q(S1, S2) being the target similarity score. We model the problem as a translation task where one possible interpretation is obtained by translating S1 (the source to translate, S) to S2 (the target translation, T). Since linguistic processing can reveal deeper similarity relationships, we also look at the translation task at different granularities of information: plain text (R for regular), after lemmatization (L), after part-of-speech (POS) tagging (P), and after removing 128 English stop-words (S) (footnote 1). Thus, ...

Footnote 1: http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/
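The supervised setup in Equation (1) can be illustrated with a minimal Python sketch. It is not the authors' RTM or MTPP implementation: the Jaccard token-overlap feature, the one-variable least-squares fit, the short `STOP_WORDS` list, and the three training tuples are all assumptions made for illustration. Only the regular (R) and stop-words-removed (S) views are sketched, since lemmatization and POS tagging would need external tools.

```python
from statistics import mean

# Hypothetical stop-word list; the 128 English stop-words used in the paper
# are not reproduced here.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "on", "in", "to"}


def views(sentence):
    """Return two token views: regular (R) and stop-words removed (S)."""
    tokens = sentence.lower().split()
    return {"R": tokens, "S": [t for t in tokens if t not in STOP_WORDS]}


def overlap(tokens1, tokens2):
    """Jaccard overlap of two token lists (a stand-in, not an MTPP feature)."""
    a, b = set(tokens1), set(tokens2)
    return len(a & b) / len(a | b) if a | b else 0.0


def features(s1, s2):
    """One overlap feature per view of the sentence pair."""
    v1, v2 = views(s1), views(s2)
    return [overlap(v1[k], v2[k]) for k in ("R", "S")]


def fit(pairs):
    """Fit f(S1, S2) = w * x + b by least squares, x = mean of the features."""
    xs = [mean(features(s1, s2)) for s1, s2, _ in pairs]
    ys = [q for _, _, q in pairs]
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    b = my - w * mx

    def f(s1, s2):
        # Clamp predictions to the STS score range [0, 5].
        return min(5.0, max(0.0, w * mean(features(s1, s2)) + b))

    return f


# Made-up (S1, S2, q(S1, S2)) tuples, for illustration only.
train = [
    ("a man is playing a guitar", "a man plays the guitar", 4.5),
    ("a dog runs in the park", "the stock market fell today", 0.0),
    ("a woman is cooking rice", "a woman is cooking", 3.5),
]
f = fit(train)
```

Calling `f("a cat sleeps", "a cat sleeps")` should score higher than `f("birds fly south", "quarterly earnings dropped")`, since the fitted slope is positive on this toy data. A realistic system would replace the single overlap feature with a much richer feature set and a stronger learner.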

Similar resources

RTM-DCU: Predicting Semantic Similarity with Referential Translation Machines

We use referential translation machines (RTMs) for predicting the semantic similarity of text. RTMs are a computational model effectively judging monolingual and bilingual similarity while identifying translation acts between any two data sets with respect to interpretants. RTMs pioneer a language independent approach to all similarity tasks and remove the need to access any task or domain spec...

RTM-DCU: Referential Translation Machines for Semantic Similarity

We use referential translation machines (RTMs) for predicting the semantic similarity of text. RTMs are a computational model for identifying the translation acts between any two data sets with respect to interpretants selected in the same domain, which are effective when making monolingual and bilingual similarity judgments. RTMs judge the quality or the semantic similarity of text by using re...

RTM at SemEval-2017 Task 1: Referential Translation Machines for Predicting Semantic Similarity

We use referential translation machines for predicting the semantic similarity of text in all STS tasks which contain Arabic, English, Spanish, and Turkish this year. RTMs pioneer a language independent approach to semantic similarity and remove the need to access any task or domain specific information or resource. RTMs become 6th out of 52 submissions in Spanish to English STS. We average pre...

Referential Translation Machines for Predicting Translation Quality and Related Statistics

We use referential translation machines (RTMs) for predicting translation performance. RTMs pioneer a language independent approach to all similarity tasks and remove the need to access any task or domain specific information or resource. We improve our RTM models with the ParFDA instance selection model (Biçici et al., 2015), with additional features for predicting the translation performance,...

RTM at SemEval-2016 Task 1: Predicting Semantic Similarity with Referential Translation Machines and Related Statistics

We use referential translation machines (RTMs) for predicting the semantic similarity of text in both STS Core and Cross-lingual STS. RTMs pioneer a language independent approach to all similarity tasks and remove the need to access any task or domain specific information or resource. RTMs become 14th out of 26 submissions in Cross-lingual STS. We also present rankings of various prediction tas...

Publication date: 2013